Highly multiplexed imaging has enabled the simultaneous spatial profiling of dozens of biological molecules in tissues at single-cell resolution. Extracting biologically relevant information such as the spatial distribution of cell phenotypes from multiplexed tissue imaging data involves a number of computational tasks, including image segmentation, feature extraction, and spatially resolved single-cell analysis. Here, we present an end-to-end workflow for multiplexed tissue image processing and analysis, integrating a number of previously developed computational tools to enable these tasks in a user-friendly and customizable fashion. For data quality assessment, we highlight the utility of napari-imc for interactively inspecting raw imaging data and the cytomapper R/Bioconductor package for image visualization in R. Raw data preprocessing, image segmentation and feature extraction are performed using the steinbock toolkit. We showcase two alternative approaches for segmenting cells based on supervised pixel classification and pre-trained deep learning models. The extracted single-cell data are then read, processed and analyzed in R. The protocol describes the use of common data containers, facilitating the application of a number of R/Bioconductor packages for dimensionality reduction, single-cell visualization and phenotyping. We provide instructions to perform spatially resolved single-cell analysis including community analysis, cellular neighborhood detection and cell-cell interaction testing using the imcRtools R/Bioconductor package. The protocol is designed for researchers with little bioinformatics training, and data analysis can be completed within 5-6 hours, depending on the segmentation approach. An extended version of the workflow can be accessed at https://bodenmillergroup.github.io/IMCDataAnalysis/.
The example data needed to run the protocol can be downloaded as follows:
options(timeout = 10000)
dir.create("data/steinbock/raw", recursive = TRUE)
## Warning in dir.create("data/steinbock/raw", recursive = TRUE):
## 'data/steinbock/raw' already exists
download.file("https://zenodo.org/record/7412972/files/panel.csv",
"data/steinbock/panel.csv")
download.file("https://zenodo.org/record/5949116/files/Patient1.zip",
"data/steinbock/raw/Patient1.zip")
download.file("https://zenodo.org/record/5949116/files/Patient2.zip",
"data/steinbock/raw/Patient2.zip")
download.file("https://zenodo.org/record/5949116/files/Patient3.zip",
"data/steinbock/raw/Patient3.zip")
download.file("https://zenodo.org/record/5949116/files/Patient4.zip",
"data/steinbock/raw/Patient4.zip")
download.file("https://zenodo.org/record/5949116/files/compensation.zip",
"data/compensation.zip")
unzip("data/compensation.zip", exdir="data", overwrite=TRUE)
unlink("data/compensation.zip")
download.file("https://zenodo.org/record/5949116/files/sample_metadata.xlsx",
destfile = "data/sample_metadata.xlsx")
download.file("https://zenodo.org/record/7432486/files/gated_cells.zip",
"data/gated_cells.zip")
unzip("data/gated_cells.zip", exdir="data", overwrite=TRUE)
unlink("data/gated_cells.zip")
To install all needed R packages, run the following code:
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github(c("BodenmillerGroup/imcRtools",
"BodenmillerGroup/cytomapper"))
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install(c("pheatmap", "viridis",
"tiff", "distill", "openxlsx", "ggrepel", "patchwork",
"mclust", "RColorBrewer", "uwot", "Rtsne", "caret",
"randomForest", "ggridges", "gridGraphics", "scales",
"CATALYST", "scuttle", "scater", "dittoSeq",
"tidyverse", "batchelor", "bluster","scran"))
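The steinbock toolkit is run via Docker. Define the following alias, replacing /path/to/IMCDataAnalysis/data/steinbock with the path to your local steinbock data directory: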
alias steinbock="docker run -v /path/to/IMCDataAnalysis/data/steinbock:/data -u $(id -u):$(id -g) ghcr.io/bodenmillergroup/steinbock:0.15.0"
In this protocol, data pre-processing refers to the extraction of multi-channel images from raw imaging data and their preparation for downstream processing. The required steps depend on the imaging technology; here, we showcase the pre-processing of raw IMC data, which includes a hot pixel filtering step.
steinbock preprocess imc images --hpf 50
## 2023-01-19 11:04:53,546 INFO steinbock - img/Patient4_005.tiff
## 2023-01-19 11:04:54,406 INFO steinbock - img/Patient4_006.tiff
## 2023-01-19 11:04:55,216 INFO steinbock - img/Patient4_007.tiff
## 2023-01-19 11:04:55,864 INFO steinbock - img/Patient4_008.tiff
## 2023-01-19 11:05:00,603 INFO steinbock - img/Patient3_001.tiff
## 2023-01-19 11:05:01,251 INFO steinbock - img/Patient3_002.tiff
## 2023-01-19 11:05:01,966 INFO steinbock - img/Patient3_003.tiff
## 2023-01-19 11:05:07,272 INFO steinbock - img/Patient2_001.tiff
## 2023-01-19 11:05:08,039 INFO steinbock - img/Patient2_002.tiff
## 2023-01-19 11:05:08,827 INFO steinbock - img/Patient2_003.tiff
## 2023-01-19 11:05:09,631 INFO steinbock - img/Patient2_004.tiff
## 2023-01-19 11:05:13,329 INFO steinbock - img/Patient1_001.tiff
## 2023-01-19 11:05:14,069 INFO steinbock - img/Patient1_002.tiff
## 2023-01-19 11:05:14,882 INFO steinbock - img/Patient1_003.tiff
## 2023-01-19 11:05:14,955 INFO steinbock - images.csv
The step took 0.57 minutes.
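The idea behind hot pixel filtering can be illustrated with a minimal base-R sketch. It assumes that each pixel is compared with the maximum of its eight neighbours and is replaced by that maximum when the difference exceeds the threshold (50 in the command above). The function name filter_hot_pixels and the toy image are hypothetical, and steinbock's actual implementation may differ in detail, for example in how image borders are handled.

```r
# Hypothetical helper, not part of steinbock: threshold-based hot pixel
# filter for a single-channel image stored as a numeric matrix.
filter_hot_pixels <- function(img, thres) {
  nr <- nrow(img)
  nc <- ncol(img)
  # Pad with -Inf so that positions outside the image never win the maximum
  padded <- matrix(-Inf, nr + 2, nc + 2)
  padded[2:(nr + 1), 2:(nc + 1)] <- img
  # Maximum over the 8 neighbours of each pixel (centre pixel excluded)
  max_nb <- matrix(-Inf, nr, nc)
  for (dr in -1:1) {
    for (dc in -1:1) {
      if (dr == 0 && dc == 0) next
      shifted <- padded[(2:(nr + 1)) + dr, (2:(nc + 1)) + dc, drop = FALSE]
      max_nb <- pmax(max_nb, shifted)
    }
  }
  # A pixel is "hot" if it exceeds its brightest neighbour by more than thres
  hot <- is.finite(max_nb) & (img - max_nb) > thres
  img[hot] <- max_nb[hot]
  img
}

toy <- matrix(1, nrow = 5, ncol = 5)
toy[3, 3] <- 100   # isolated hot pixel
toy[1, 1] <- 40    # bright, but within the threshold
filtered <- filter_hot_pixels(toy, thres = 50)
```

In this toy image only the isolated pixel at position (3, 3) is replaced; the moderately bright corner pixel is kept because it exceeds its neighbours by less than the threshold.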
Perform automatic deep learning-enabled single-cell segmentation using the pre-trained Mesmer neural network implemented in DeepCell. In the following command, channels will be min-max-normalized and mean-aggregated according to the deepcell column in the panel file.
steinbock segment deepcell --minmax
## /opt/venv/lib/python3.8/site-packages/deepcell_toolbox/deep_watershed.py:179: FutureWarning: `selem` is a deprecated argument name for `h_maxima`. It will be removed in version 1.0. Please use `footprint` instead.
## markers = h_maxima(image=maxima,
## 2023-01-19 11:05:46,458 INFO steinbock - masks/Patient1_001.tiff
## 2023-01-19 11:05:59,741 INFO steinbock - masks/Patient1_002.tiff
## 2023-01-19 11:06:12,880 INFO steinbock - masks/Patient1_003.tiff
## 2023-01-19 11:06:25,566 INFO steinbock - masks/Patient2_001.tiff
## 2023-01-19 11:06:37,911 INFO steinbock - masks/Patient2_002.tiff
## 2023-01-19 11:06:51,265 INFO steinbock - masks/Patient2_003.tiff
## 2023-01-19 11:07:03,837 INFO steinbock - masks/Patient2_004.tiff
## 2023-01-19 11:07:17,476 INFO steinbock - masks/Patient3_001.tiff
## 2023-01-19 11:07:29,921 INFO steinbock - masks/Patient3_002.tiff
## 2023-01-19 11:07:43,243 INFO steinbock - masks/Patient3_003.tiff
## 2023-01-19 11:07:56,739 INFO steinbock - masks/Patient4_005.tiff
## 2023-01-19 11:08:10,194 INFO steinbock - masks/Patient4_006.tiff
## 2023-01-19 11:08:21,969 INFO steinbock - masks/Patient4_007.tiff
## 2023-01-19 11:08:33,911 INFO steinbock - masks/Patient4_008.tiff
The step took 3.34 minutes.
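The channel preparation described for this step (min-max normalization followed by mean aggregation per channel group) can be sketched in base R as follows. This is an illustration under assumptions, not steinbock's code: the helper names, the array layout [row, column, channel] and the grouping vector (standing in for the deepcell column of the panel file, for instance one group of nuclear and one group of membrane channels) are hypothetical.

```r
# Hypothetical helpers, not part of steinbock.
minmax_normalize <- function(x) {
  rng <- range(x)
  if (rng[1] == rng[2]) return(x * 0)  # constant channel: avoid division by zero
  (x - rng[1]) / (rng[2] - rng[1])
}

# img:    numeric array with dimensions [row, column, channel]
# groups: one entry per channel; NA marks channels that are not used
aggregate_channels <- function(img, groups) {
  ids <- sort(unique(groups[!is.na(groups)]))
  out <- array(0, dim = c(dim(img)[1:2], length(ids)))
  for (i in seq_along(ids)) {
    chans <- which(groups == ids[i])
    # Normalize every channel separately, then average within the group
    normed <- lapply(chans, function(ch) minmax_normalize(img[, , ch]))
    out[, , i] <- Reduce(`+`, normed) / length(chans)
  }
  out
}

set.seed(1)
toy <- array(runif(4 * 4 * 3, max = 100), dim = c(4, 4, 3))
# Channel 1 forms group 1; channels 2 and 3 are averaged into group 2
seg_input <- aggregate_channels(toy, groups = c(1, 2, 2))
```

Normalizing before averaging keeps a single bright channel from dominating the aggregated image that is passed to the segmentation model.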
For each image, extract the mean pixel intensity per cell and marker. The resulting cell-level intensity values are stored as separate CSV files (one file per image):
steinbock measure intensities
## 2023-01-19 11:08:38,573 INFO steinbock - intensities/Patient1_001.csv
## 2023-01-19 11:08:39,535 INFO steinbock - intensities/Patient1_002.csv
## 2023-01-19 11:08:40,633 INFO steinbock - intensities/Patient1_003.csv
## 2023-01-19 11:08:41,717 INFO steinbock - intensities/Patient2_001.csv
## 2023-01-19 11:08:42,675 INFO steinbock - intensities/Patient2_002.csv
## 2023-01-19 11:08:43,554 INFO steinbock - intensities/Patient2_003.csv
## 2023-01-19 11:08:44,705 INFO steinbock - intensities/Patient2_004.csv
## 2023-01-19 11:08:45,819 INFO steinbock - intensities/Patient3_001.csv
## 2023-01-19 11:08:47,775 INFO steinbock - intensities/Patient3_002.csv
## 2023-01-19 11:08:48,819 INFO steinbock - intensities/Patient3_003.csv
## 2023-01-19 11:08:49,799 INFO steinbock - intensities/Patient4_005.csv
## 2023-01-19 11:08:50,795 INFO steinbock - intensities/Patient4_006.csv
## 2023-01-19 11:08:51,670 INFO steinbock - intensities/Patient4_007.csv
## 2023-01-19 11:08:53,039 INFO steinbock - intensities/Patient4_008.csv
The step took 0.29 minutes.
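As a minimal illustration of what "mean pixel intensity per cell" means, the following base-R sketch averages the pixels of one channel within each labelled region of a segmentation mask. The function name and the toy matrices are hypothetical; the mask is assumed to use 0 for background and a positive integer per cell.

```r
# Hypothetical helper: mean intensity of one channel per labelled cell.
# mask: integer matrix, 0 = background, k > 0 = pixels belonging to cell k
# img:  numeric matrix of one channel with the same dimensions as mask
mean_intensity_per_cell <- function(img, mask) {
  keep <- mask > 0
  tapply(img[keep], mask[keep], mean)
}

mask <- matrix(c(1, 1, 0,
                 1, 2, 2,
                 0, 2, 2), nrow = 3, byrow = TRUE)
img <- matrix(c(4,  6,  9,
                2, 10, 20,
                9, 30, 40), nrow = 3, byrow = TRUE)
means <- mean_intensity_per_cell(img, mask)
```

Repeating this calculation for every channel yields one row per cell and one column per marker, which is the layout of the per-image CSV files described above.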
For each image, extract standard morphological features (e.g., area, eccentricity) per cell. The resulting cell-level features are stored as separate CSV files (one file per image):
steinbock measure regionprops
## 2023-01-19 11:08:56,102 INFO steinbock - regionprops/Patient1_001.csv
## 2023-01-19 11:08:57,284 INFO steinbock - regionprops/Patient1_002.csv
## 2023-01-19 11:08:58,549 INFO steinbock - regionprops/Patient1_003.csv
## 2023-01-19 11:08:59,672 INFO steinbock - regionprops/Patient2_001.csv
## 2023-01-19 11:09:00,847 INFO steinbock - regionprops/Patient2_002.csv
## 2023-01-19 11:09:01,812 INFO steinbock - regionprops/Patient2_003.csv
## 2023-01-19 11:09:03,127 INFO steinbock - regionprops/Patient2_004.csv
## 2023-01-19 11:09:04,532 INFO steinbock - regionprops/Patient3_001.csv
## 2023-01-19 11:09:05,714 INFO steinbock - regionprops/Patient3_002.csv
## 2023-01-19 11:09:06,975 INFO steinbock - regionprops/Patient3_003.csv
## 2023-01-19 11:09:08,022 INFO steinbock - regionprops/Patient4_005.csv
## 2023-01-19 11:09:09,487 INFO steinbock - regionprops/Patient4_006.csv
## 2023-01-19 11:09:10,576 INFO steinbock - regionprops/Patient4_007.csv
## 2023-01-19 11:09:11,637 INFO steinbock - regionprops/Patient4_008.csv
The step took 0.31 minutes.
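Two of the simplest morphological features, area and centroid, can be derived directly from a segmentation mask, as in the base-R sketch below. The function name, column names and toy mask are hypothetical, and shape descriptors such as eccentricity are deliberately left out because they require fitting an ellipse to each region.

```r
# Hypothetical helper: area (pixel count) and centroid per labelled cell.
# mask: integer matrix, 0 = background, k > 0 = pixels belonging to cell k
cell_features <- function(mask) {
  idx <- which(mask > 0, arr.ind = TRUE)  # row/column index of every cell pixel
  labels <- mask[mask > 0]                # same (column-major) order as idx
  data.frame(
    Object = sort(unique(labels)),
    area = as.numeric(tapply(labels, labels, length)),
    centroid_row = as.numeric(tapply(idx[, "row"], labels, mean)),
    centroid_col = as.numeric(tapply(idx[, "col"], labels, mean))
  )
}

mask <- matrix(c(1, 1, 0,
                 1, 2, 2,
                 0, 2, 2), nrow = 3, byrow = TRUE)
features <- cell_features(mask)
```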
In each image, detect cells in close spatial proximity. The resulting spatial cell graphs are stored as separate directed edge lists in CSV format (one file per image):
steinbock measure neighbors --type expansion --dmax 4
## 2023-01-19 11:09:16,799 INFO steinbock - neighbors/Patient1_001.csv
## 2023-01-19 11:09:19,852 INFO steinbock - neighbors/Patient1_002.csv
## 2023-01-19 11:09:23,404 INFO steinbock - neighbors/Patient1_003.csv
## 2023-01-19 11:09:26,336 INFO steinbock - neighbors/Patient2_001.csv
## 2023-01-19 11:09:29,224 INFO steinbock - neighbors/Patient2_002.csv
## 2023-01-19 11:09:31,628 INFO steinbock - neighbors/Patient2_003.csv
## 2023-01-19 11:09:35,282 INFO steinbock - neighbors/Patient2_004.csv
## 2023-01-19 11:09:38,903 INFO steinbock - neighbors/Patient3_001.csv
## 2023-01-19 11:09:42,087 INFO steinbock - neighbors/Patient3_002.csv
## 2023-01-19 11:09:45,597 INFO steinbock - neighbors/Patient3_003.csv
## 2023-01-19 11:09:48,152 INFO steinbock - neighbors/Patient4_005.csv
## 2023-01-19 11:09:52,139 INFO steinbock - neighbors/Patient4_006.csv
## 2023-01-19 11:09:55,011 INFO steinbock - neighbors/Patient4_007.csv
## 2023-01-19 11:09:57,745 INFO steinbock - neighbors/Patient4_008.csv
The step took 0.77 minutes.
Read the spatially resolved single-cell data into R using the imcRtools package. For the rest of the protocol, we will continue with the steinbock-generated data.
library(imcRtools)
spe <- read_steinbock("data/steinbock/")
The step took 0.57 minutes.
After reading in the single-cell data, the SpatialExperiment object needs to be further processed.
First, the column names are set based on the image name and the cell identifier.
The patient identifier and the region of interest (ROI) identifier are saved in the object as well as the cancer type, which can be read in from the provided data/sample_metadata.xlsx file.
For easy access later on, the channels containing biological variation are selected.
Finally, the mean pixel intensities per channel and cell are arsinh-transformed using a cofactor of 1.
library(openxlsx)
library(tidyverse)
colnames(spe) <- paste0(spe$sample_id, "_", spe$ObjectNumber)
meta <- read.xlsx("data/sample_metadata.xlsx")
spe$patient_id <- as.vector(str_extract_all(spe$sample_id, "Patient[1-4]",
simplify = TRUE))
spe$ROI <- as.vector(str_extract_all(spe$sample_id, "00[1-8]",
simplify = TRUE))
spe$indication <- meta$Indication[match(spe$patient_id, meta$Sample.ID)]
rowData(spe)$use_channel <- !grepl("DNA|Histone", rownames(spe))
assay(spe, "exprs") <- asinh(counts(spe)/1)
The step took 0.04 minutes.
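The effect of the cofactor in the arsinh transformation, and the pattern matching used to derive patient and ROI identifiers, can be tried out on toy values with base R alone. The helper name arsinh_transform and the example counts are hypothetical; the sample identifiers follow the image names shown in the steinbock output above.

```r
# Hypothetical helper: inverse hyperbolic sine transformation with a cofactor
arsinh_transform <- function(counts, cofactor = 1) asinh(counts / cofactor)

counts <- c(0, 1, 10, 100, 1000)
exprs_cof1 <- arsinh_transform(counts, cofactor = 1)
exprs_cof5 <- arsinh_transform(counts, cofactor = 5)

# Base-R equivalent of the identifier extraction performed with stringr
sample_id <- c("Patient1_001", "Patient4_008")
patient_id <- regmatches(sample_id, regexpr("Patient[1-4]", sample_id))
roi <- regmatches(sample_id, regexpr("00[1-8]", sample_id))
```

The transformation is approximately linear for values well below the cofactor and approximately logarithmic for values well above it, so a larger cofactor widens the near-linear range around zero.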
Read in multi-channel images as a CytoImageList container using the cytomapper package.
library(cytomapper)
images <- loadImages("data/steinbock/img/")
channelNames(images) <- rownames(spe)
The step took 0.26 minutes.
Read in segmentation masks as a CytoImageList container.
masks <- loadImages("data/steinbock/masks/", as.is = TRUE)
## All files in the provided location will be read in.
The step took 0 minutes.
For downstream visualization and analysis tasks, additional metadata needs to be added to the CytoImageList objects storing the multi-channel images and segmentation masks. Here, individual images, segmentation masks and entries in the SpatialExperiment object are matched via the sample_id entry.
patient_id <- str_extract_all(names(images), "Patient[1-4]", simplify = TRUE)
indication <- meta$Indication[match(patient_id, meta$Sample.ID)]
mcols(images) <- mcols(masks) <- DataFrame(sample_id = names(images),
patient_id = patient_id,
indication = indication)
The step took 0 minutes.
Low signal spillover between neighbouring channels occurs when using technologies that rely on mass cytometry. Spillover is defined as a small proportion of the signal of a neighbouring channel that is detected in the primary channel. As spillover is linear in the signal of the neighbouring channel, it can be corrected for using a previously described compensation approach. This phenomenon is specific to IMC, and the steps of the following section can be skipped when working with data generated by other multiplexed imaging technologies.
Read in data from the spillover slide for channel-to-channel spillover correction. The experimental procedure to create and acquire a spillover slide is described in Supplementary Note 2. As recommended by the CATALYST R/Bioconductor package, the pixel intensities are arsinh-transformed using a cofactor of 5.
sce <- readSCEfromTXT("data/compensation/")
## Spotted channels: Y89, In113, In115, Pr141, Nd142, Nd143, Nd144, Nd145, Nd146, Sm147, Nd148, Sm149, Nd150, Eu151, Sm152, Eu153, Sm154, Gd155, Gd156, Gd158, Tb159, Gd160, Dy161, Dy162, Dy163, Dy164, Ho165, Er166, Er167, Er168, Tm169, Er170, Yb171, Yb172, Yb173, Yb174, Lu175, Yb176
## Acquired channels: Ar80, Y89, In113, In115, Xe131, Xe134, Ba136, La138, Pr141, Nd142, Nd143, Nd144, Nd145, Nd146, Sm147, Nd148, Sm149, Nd150, Eu151, Sm152, Eu153, Sm154, Gd155, Gd156, Gd158, Tb159, Gd160, Dy161, Dy162, Dy163, Dy164, Ho165, Er166, Er167, Er168, Tm169, Er170, Yb171, Yb172, Yb173, Yb174, Lu175, Yb176, Ir191, Ir193, Pt196, Pb206
## Channels spotted but not acquired:
## Channels acquired but not spotted: Ar80, Xe131, Xe134, Ba136, La138, Ir191, Ir193, Pt196, Pb206
assay(sce, "exprs") <- asinh(counts(sce)/5)
The step took 0.11 minutes.
Perform quality assessment of the spillover data by visualizing the median pixel intensity per channel and spotted metal.
plotSpotHeatmap(sce)
(optional) Perform pixel binning to increase median pixel intensity. This is only needed if pixel intensities are too low (median below ~200 counts).
sce2 <- binAcrossPixels(sce, bin_size = 10)
The step took 0.2 minutes.
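The principle of pixel binning can be sketched in base R as follows, under the assumption that the counts of consecutive pixels of one spot are summed in bins of a fixed size. The helper bin_pixels and the simulated counts are hypothetical and only stand in for what binAcrossPixels does per spot and channel.

```r
# Hypothetical helper: sum the counts of consecutive pixels in bins
bin_pixels <- function(counts, bin_size) {
  bin <- ceiling(seq_along(counts) / bin_size)
  as.numeric(tapply(counts, bin, sum))
}

set.seed(42)
pixel_counts <- rpois(100, lambda = 30)  # simulated low-intensity spot
binned <- bin_pixels(pixel_counts, bin_size = 10)

median(pixel_counts)  # median before binning
median(binned)        # median after binning
```

Binning trades the number of pixels for a higher intensity per pixel: the total signal is unchanged, while the median per (binned) pixel increases roughly by the bin size.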
Filter incorrectly assigned pixels. The following step uses functions provided by the CATALYST package to “de-barcode” the pixels. Based on the intensity distribution of all channels, pixels are assigned to their corresponding barcode; here, this is the already known metal spot. This procedure identifies pixels that cannot be robustly assigned to the spotted metal. Such pixels can be regarded as “noisy”, “background”, or “artifacts” and should be removed prior to spillover estimation. The spotted channels (bc_key) need to be specified. The general workflow for pixel de-barcoding is as follows: (1) assign a preliminary metal mass to each pixel (assignPrelim), (2) estimate a cutoff parameter for the separation between positive and negative pixel sets (estCutoffs), and (3) apply the estimated cutoffs to identify truly positive pixels (applyCutoffs).
In cases where incorrect assignments occurred or where few pixels were measured for some spots, the imcRtools package exports a helper function (filterPixels) to exclude pixels.
library(CATALYST)
bc_key <- as.numeric(unique(sce$sample_mass))
bc_key <- bc_key[order(bc_key)]
sce <- assignPrelim(sce, bc_key = bc_key)
## Debarcoding data...
## o ordering
## o classifying events
## Normalizing...
## Computing deltas...
sce <- estCutoffs(sce)
sce <- applyCutoffs(sce)
sce <- filterPixels(sce, minevents = 40, correct_pixels = TRUE)
The step took 0.14 minutes.
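The logic of pixel de-barcoding can be illustrated with a strongly simplified base-R sketch. It assumes that each pixel is preliminarily assigned to the channel with the highest scaled intensity, and that the separation between the highest and the second-highest channel is compared against a cutoff. The toy intensities, the per-pixel scaling and the fixed cutoff of 0.3 are hypothetical; CATALYST normalizes the data and estimates cutoffs per channel in a more elaborate way.

```r
# Toy data: rows = pixels, columns = channels (hypothetical values)
intens <- matrix(c(120,  4,  1,
                    60, 55,  2,
                     3,  2, 90),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(NULL, c("Yb173", "Yb174", "Lu175")))
spotted <- c("Yb173", "Yb173", "Yb173")  # metal spotted where each pixel was acquired

# Scale each pixel (row) to its own maximum so separations are comparable
scaled <- intens / apply(intens, 1, max)

# Preliminary assignment: channel with the highest intensity per pixel
assigned <- colnames(intens)[max.col(scaled, ties.method = "first")]

# Separation between the highest and second-highest channel per pixel
delta <- apply(scaled, 1, function(x) {
  s <- sort(x, decreasing = TRUE)
  unname(s[1] - s[2])
})

cutoff <- 0.3  # hypothetical fixed cutoff, for illustration only
keep <- assigned == spotted & delta >= cutoff
```

In this example the first pixel is kept, the second is discarded because its two top channels are poorly separated, and the third is discarded because it is assigned to a metal other than the spotted one.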
Compute and store the spillover matrix using the CATALYST package.
sce <- computeSpillmat(sce)
sm <- metadata(sce)$spillover_matrix
The step took 0.03 minutes.
Perform single-cell data compensation using the CATALYST package. The compCytof function corrects channel-to-channel spillover directly on the single-cell intensities using the previously estimated spillover matrix. The isotope_list provided by the CATALYST package needs to be extended with any acquired isotopes that it does not contain (here, Ar80). Visualizing marker intensities of neighboring channels (e.g., Ecad in Yb173 and CD303 in Yb174) before and after correction can be used to assess the efficacy of spillover correction.
library(dittoSeq)
library(patchwork)
rowData(spe)$channel_name <- paste0(rowData(spe)$channel, "Di")
isotope_list <- CATALYST::isotope_list
isotope_list$Ar <- 80
spe <- compCytof(spe, sm,
transform = TRUE, cofactor = 1,
isotope_list = isotope_list,
overwrite = FALSE)
before <- dittoScatterPlot(spe, x.var = "Ecad", y.var = "CD303",
assay.x = "exprs", assay.y = "exprs") +
ggtitle("Before compensation")
after <- dittoScatterPlot(spe, x.var = "Ecad", y.var = "CD303",
assay.x = "compexprs", assay.y = "compexprs") +
ggtitle("After compensation")
before + after
assay(spe, "counts") <- assay(spe, "compcounts")
assay(spe, "exprs") <- assay(spe, "compexprs")
assay(spe, "compcounts") <- assay(spe, "compexprs") <- NULL
The step took 0.2 minutes.
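Because spillover is linear, compensation can be understood as solving a small linear system. The base-R sketch below assumes a spillover matrix whose rows are emitting channels and whose columns are receiving channels, so that the observed signal equals the true signal multiplied by the matrix. All values are hypothetical, and plain matrix inversion is used for illustration only; the methods offered by compCytof may differ, for example by constraining the result to be non-negative.

```r
chans <- c("Yb173", "Yb174", "Lu175")

# Hypothetical spillover matrix: rows = emitting, columns = receiving channel
sm_toy <- matrix(c(1.00, 0.02, 0.00,
                   0.01, 1.00, 0.03,
                   0.00, 0.00, 1.00),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(chans, chans))

# Hypothetical true signal of three cells (rows) in three channels (columns)
true_signal <- matrix(c(100,  0,  0,
                          0, 50,  0,
                         20, 20, 20),
                      nrow = 3, byrow = TRUE,
                      dimnames = list(NULL, chans))

observed <- true_signal %*% sm_toy        # what the instrument would record
compensated <- observed %*% solve(sm_toy) # undo the linear mixing
```

The first cell, which only carries Yb173 signal, shows an apparent Yb174 intensity of 2 before compensation; after compensation the original values are recovered up to numerical precision.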
Perform channel-to-channel spillover correction on multi-channel images. To this end, the previously computed spillover matrix needs to be adjusted to only retain channels that are stored in the multi-channel images. By visualizing neighboring channels, spillover correction efficacy can be assessed.
channelNames(images) <- rowData(spe)$channel_name
adapted_sm <- adaptSpillmat(sm, paste0(rowData(spe)$channel, "Di"),
isotope_list = isotope_list)
## Compensation is likely to be inaccurate.
## Spill values for the following interactions
## have not been estimated:
## Ir191Di -> Ir193Di
## Ir193Di -> Ir191Di
images_comp <- compImage(images, adapted_sm)
plotPixels(images[5], colour_by = "Yb173Di",
image_title = list(text = "Yb173 (Ecad) - before",
position = "topleft"),
legend = NULL, bcg = list(Yb173Di = c(0, 4, 1)))
plotPixels(images[5], colour_by = "Yb174Di",
image_title = list(text = "Yb174 (CD303) - before",
position = "topleft"),
legend = NULL, bcg = list(Yb174Di = c(0, 4, 1)))
plotPixels(images_comp[5], colour_by = "Yb173Di",
image_title = list(text = "Yb173 (Ecad) - after",
position = "topleft"),
legend = NULL, bcg = list(Yb173Di = c(0, 4, 1)))
plotPixels(images_comp[5], colour_by = "Yb174Di",
image_title = list(text = "Yb174 (CD303) - after",
position = "topleft"),
legend = NULL, bcg = list(Yb174Di = c(0, 4, 1)))
channelNames(images_comp) <- rownames(spe)
The step took 9.59 minutes.
Outline cells on composite images for visual assessment of segmentation quality. For visualization purposes, we randomly select three images and outline all cells on composite images after channel normalization.
set.seed(20220118)
img_ids <- sample(seq_len(length(images_comp)), 3)
cur_images <- images_comp[img_ids]
cur_images <- normalize(cur_images, separateImages = TRUE)
cur_images <- normalize(cur_images, inputRange = c(0, 0.2))
plotPixels(cur_images,
mask = masks[img_ids],
img_id = "sample_id",
missing_colour = "white",
colour_by = c("CD163", "CD20", "CD3", "Ecad", "DNA1"),
colour = list(CD163 = c("black", "yellow"),
CD20 = c("black", "red"),
CD3 = c("black", "green"),
Ecad = c("black", "cyan"),
DNA1 = c("black", "blue")),
image_title = NULL,
legend = list(colour_by.title.cex = 0.9,
colour_by.labels.cex = 0.9))
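The two normalization calls used before plotting can be mimicked on a single toy channel in base R. The sketch assumes that the first call rescales each channel to the range 0 to 1 and that the second call clips values to the given input range (here 0 to 0.2) before rescaling them to 0 to 1 again; the helper names and the simulated channel are hypothetical.

```r
# Hypothetical helpers mimicking the two display normalization steps
scale_to_unit <- function(x) (x - min(x)) / (max(x) - min(x))

clip_and_rescale <- function(x, input_range) {
  x <- pmin(pmax(x, input_range[1]), input_range[2])  # clip to the input range
  (x - input_range[1]) / (input_range[2] - input_range[1])
}

set.seed(7)
channel <- matrix(rexp(25), nrow = 5, ncol = 5)  # simulated skewed intensities
step1 <- scale_to_unit(channel)                  # min-max scaling to [0, 1]
step2 <- clip_and_rescale(step1, c(0, 0.2))      # saturate the brightest pixels
```

Clipping at 0.2 saturates every pixel brighter than 20% of the channel maximum, which makes dim structures visible at the cost of detail in the brightest regions. The setting therefore only affects visualization and should not be applied to data used for quantification.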